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PREDICTION  OF  POLYTOMOUS  EVENTS: 

MODEL  DESCRIPTION.  ALGORITHM  DEVELOPMENT 
AND  METHODOLOGICAL  ASPECTS.  WITH  AN  APPLICATION 

I.  O’MUIRCHEARTAIGH 
D.  P.  GAVER 

1 .  Introduction 

The  prediction  of  dichotomous  events  in  meteorology  (fog/no  fog, 
precipitation/no  precipitation)  has  been  widely  studied.  Such  predictions  are 
also  of  interest  in  reliability  and  survival  analysis,  and  in  manpower 
planning.  The  analysis  generally  involves  logistic  regression  or 
(equivalently)  linear  discriminant  functions.  Most  standard  statistical 
packages  (e.g.  BMDP,  SAS)  provide  the  facility  for  performing  this  analysis  in 
some  form.  Also,  the  book  by  Cox  (1970)  can  be  consulted. 

A  natural  extension  of  this  problem  (and  one  which  has  many  potential 
applications  in  meteorology  and  elsewhere)  is  the  situation  in  which  the 
predictand  is  polytomous,  i.e.  has  multiple  categories.  For  example,  it  might 
be  desired  to  predict  visibility  (good/marginal/bad)  or  precipitation 
( none/rain/ snow) .  Two  (methodologically)  distinct  cases  can  be  envisaged, 
viz. 

a.  when  the  predictand  involves  ordered  categories 

b.  when  the  categories  are  unordered. 

Typically,  the  former  case  is  the  more  common  in  meteorological  applications, 
and  this  is  the  problem  addressed  in  this  report. 

The  particular  application  analyzed  here  involves  583  records  of  time  to 
formation  of  tropical  storms,  and  associated  values  of  various  meteorological 


variables.  The  time  to  storm  formation  is  polytomized  and  recorded  as 


1:  storm  formed  within  24  hours 
2:  storm  formed  between  24  and  48  hours 
3:  storm  formed  between  48  and  72  hours 
4-‘  storm  did  not  form 

The  five  meteorological  variables  recorded  were: 

Xj :  unconditional  probability  of  storm  formation  -  a  measure  of  the 
likelihood  of  storm  formation  in  the  particular  disturbance 
location  at  the  given  time  of  year. 

X£:  large  scale  vorticlty  (computed  over  5  Latitude  grid). 

X^:  divergence  (computed  over  5  Latitude  grid). 

X.:  small  scale  vorticlty  .(computed  over  25  Latitude  grid) 

and  X4). 

Our  objective  was  to  determine  how  much  predictive  information  is  provided  by 
the  meteorological  variables  to  facilitate  prediction  of  imminence  of  tropical 
storms.  Essentially  the  problem  involves  regression  models  where  the 
dependent  variable  is  ordinal.  Much  attention  has  been  devoted  to  this 
general  problem  in  the  statistical  literature  of  recent  years  (McCullagh 
[1980],  McCullagh  and  Nelder  [1983],  Green  [1984],  Anderson  [1984]).  The 
central  concept  is  that  of  the  generalized  linear  model  (McCullagh  and  Nelder 
[1983]). 

In  section  2  we  describe  briefly  the  concept  of  generalized  models,  and 
in  more  detail,  the  particular  model  (an  extension  of  [dichotomous]  logistic 
regression)  utilized  for  our  data.  In  section  3  we  summarize  the  results  of 
an  ad  hoc  application  of  the  model  to  our  data,  present  the  relevant  parameter 
estimates,  and  evaluate  the  predictive  performance  of  our  model.  A  general 
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local  generation  of  vorticlty  (product  of  X^ 


discussion  of  our  results  is  presented  In  section  4.  together  with  suggestions 
for  future  work. 


2.  The  Model . 

2.1  General  formulation. 

Following  Green  [1984]  and  McCullagh  and  Nelder  [1983],  we  consider  a  log 
likelihood  L,  a  function  of  an  n-vector,  tj.  of  predictors.  We  postulate,  in 
our  model,  that  the  predictor  tj  is  functionally  dependent  on  the  p-vector  f)  of 
parameters  of  interest.  For  our  particular  application,  tj,  and  p  are 
specified  in  Section  2.2.  The  maximum  likelihood  estimation  of  P  involves 
essentially  solution  of  the  equations 


(1) 


where  u  .  =  — 
nxl 
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using  the  notation  of  Green  [1984].  Again  following  Green,  the  standard 
Newton-Raphson  method  for  the  iterative  solution  of  (1)  involves  evaluating  u. 
D  and  the  second  derivatives  of  L  for  an  Initial  value  of  p.  and  then  solving 
the  linear  equations 


(-%)  (P*  ~  P)  =  DTu  (2) 

&PP 


for  an  upxlated  estimate 


Or  'en  shows  that  this  is  approximately  equivalent 


to  the  solution  of 


(DTAD  )(0*-0)  =  DTu  (3) 

T 

where  A  =  E(^  ($L)  ). 
nxn  v6tj  'otj'  ’ 

given  an  initial  estimate  of  0  (about  which  we  have  some  further  comnents  in 

Section  2.3).  Equation  (3)  can  be  solved  directly  for  0  .  or.  equivalently, 

|| 

if  a  weighted  least  squares  program  is  available,  0  results  from  regression 
A  *u  +  P0  onto  the  columns  of  0  using  weight  matrix  A. 

2.2  Specific  Formulation. 

In  our  application  the  data  are  in  the  form  of  N  multinomial  samples  on 
the  same  set  of  k(=4)  response  categories  (e.g.  categories  1,  2.  3  and  4 
indicating  storm  imminence  as  described  in  Section  1).  The  data  may  be 
arranged  as  a  two-way  table  of  counts  y^,  1*1... N;  j=l ,  ,,,4.  The  log- 
likelihood  L  is  then  given  by 

L  *  \  *  yij  1o*  "u  <4> 

4 

where  p  are  the  cell  probabilities,  and  2  p. .  =  1. 
ij  j=1 

In  the  case  where  the  categories  1,2 . k  are  ordered,  McCullagh  and 

Nelder  [1983]  and  Green  [1984]  both  suggest  the  model 


where  77^  represent  for  fixed  i.  the  cumulative  cell  probabilities,  the  matrix 
(xim)  represents  covariate  information  and  ^  is  a  given  distribution  function. 
Motivation  for  their  model  is  provided  by  considering  the  response  variable  as 
an  arbitrary  grouping  of  an  unobservable  underlying  variable  on  a  continuous 
scale  with  "outpoints”  0^  ...0^.  In  some  applications  the  0  *s  will  be 
unknown  and  will  need  to  be  estimated;  in  others,  such  as  the  present  one. 
they  will  be  known,  because  the  categorical  variable  (storm  imminence  coded 
1,2, 3. 4)  really  is  an  arbitrary  grouping  of  an  underlying  continuous  variable 
based  on  known  cutpoints  (in  this  case  that  variable  is  time  to  storm 
formation  with  cut-points  0^24,  02=48  and  93=72).  For  our  application,  we 
chose  ^  to  be  the  logistic  distribution  function,  viz. , 
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This  is  the  most  widely  used  model  in  applications  and  has  the  advantage  that 
a  simple  transformation  can  be  used  to  (a)  graphically  check  the  suitability 
of  the  model  and  (b)  provide  initial  values  for  the  iterative  estimation 


process . 


Our  model  therefore  is 
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1  +  exp  - 


j  .  im  m 
m=  l 


or,  in  terms  of  previous  notation 


n  =  V(P) 


where  P  =  (p^,...f 3^,Tj , . , .r^) .  Unless  we  Impose  constraints,  ident if  lability 


problems  can  arise.  One  common  expedient  (based  on  the  concept  of  an 


underlying  continuous  variable  used  for  the  classification)  is  to  allow  an 


intercept  and  scale  to  write 


Vf-2,*  i. 

+< - - ) 


i  =  l _ N;  j=l . k-1 . 


where  0’s  are  known. 


A  reformulation,  more  convenient  for  actual  computation 


"u  -  +<P0  *  Bl9j  * 


2.3  Specific  Methodology 


We  now  apply  the  general  methodology  of  Section  2.1  to  the  specific  model 


.  t  ,  4>fr» 


described  in  equation  (9).  This  involves  two  steps 


(a)  deriving  explicit  expressions  for  A.  D  and  u  described  in  that 
section,  and 

(b)  Finding  a  suitable  starting  value  p  for  the  iterative  reweighted 
least  squares  (IRLS)  procedure. 

We  describe  first  the  expressions  for  D,  A  and  u  for  the  special  case  of 
M=1  covariates  with  k=4  categories.  The  extension  to  other  cases  is 
straightforward.  If  we  let 


’W  =  ^lr  r,i2* 

n21 ’  v22' ' ' 

••  ’W 

3tj 

where  n  =  Nx3,  then  D  =  -g s 

dp 

is  a  matrix  given  by 

/  FtBj.Xj) 

BjFtej.Xj) 

X1F(01 *X1 >  \ 

F(fl2,x1) 

02F^02'X1^ 

X1F( G2  *  X1 ) 

Dnx3  = 

1  F(83.x1) 

03F^03,Xl^ 

X1F(03’X1 ) 

F(01 ,x2^ 

01F(61x2> 

x2F^01’x2^ 

where 

l  F<e3xN> 

03F{03iXN) 

XNFt03’XN) 

...  .  _  ««p  (Pi-  g2VP3*i> 

**  \  “  4  •  X  .  )  — 
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Similarly,  u^j  =  is  given  by 


and  A _ =  F  ( - =■)  is  given  by 


- r  u 
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T,11*T,12 
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^21  (1722-t,21  ^ 


etc 


|  0  0 

with  tridiagonal  3x3  matrices  similar  to  the  one  given  above  along  the  main 


diagonal,  and  zeroes  elsewhere. 


8 


Each  of  A,  D  and  u  depends  on  the  unknown  parameters  g.  Given  an  initial 


value  for  P  we  can  evaluate  A.  D  and  u  and  commence  the  iterative  process. 

The  initial  values  can  be  obtained  by  noting  that  if  we  apply  a  logit 
transformation  to  y^j  in  equation  (10)  to  give 


then  these  logits  are  linear  in  the  parameters  g.  Initial  estimates  of  g  can 
be  obtained  by  constructing  empirical  logits. 


(15) 


and  performing  an  unweighted  least  squares  analysis  for  the  model  of  equation 
(14)  using  these  empirical  logits  as  the  dependent  variable. 

Two  points  should  be  noted  in  relation  to  the  estimation  of  initial 
values  for  p.  Firstly,  the  IRLS  procedure  is  quite  sensitive  to  the  choice  of 
initial  value,  (see  Green  (1984)),  and  a  poor  choice  can  lead  to 
non-convergence  of  the  algorithm.  Secondly,  to  obtain  initial  values  by  this 


procedure,  it  may  be  necessary  to  group  the  observations  into  categories  based 
on  values  of  the  independent  variables(s) .  If  the  data  are  not  so  grouped, 
(and  our  data  consists  essentially  of  multinomial  samples  of  size  1),  then  all 


y 


ij 


will  be  either  0  or  1  in  our  case  this  will  lead  to  all  the  empirical 


logits  having  value  either  In  3  or  -In  3. 


Although,  it  may  be  necessary  to  group  the  data  to  obtain  starting  values 


for  0,  the  maximum  likelihood  estimation  of  0  may  be  carried  out  for  either 
the  grouped  or  the  ungrouped  data. 

2.4  Computational  procedure. 

Since  a  stepwise  program  was  not  available,  the  model  given  by  equation  (10) 
was  estimated  separately  for  each  covariate.  Using  the  deviance  (the 
likelihood  ratio  statistic  against  the  saturated  model)  as  a  measure  of 
goodness  of  fit,  the  best  single  explanatory  variable  was  Xg.  Having  thus 
determined  the  optimal  single  variable  for  inclusion  in  the  model,  we  then 
proceeded  to  establish  which,  if  any,  variable  should  next  be  included  in  the 
model.  Due  to  the  non-availability  of  a  package  for  performing  this  analysis, 
the  computation  involved  was  cumbersome;  the  inclusion  of  an  additional 
variable  necessitated  the  re-programming  of  the  computations  leading  to  the 
matrices/vectors  A,  D  and  u.  It  was  determined  that  X£  was  the  next  variable 
which  should  be  included  in  the  model.  A  third  step  of  the  stepwise  procedure 
was  also  carried  out,  but  no  additional  variable  warranted  inclusion  in  the 
model . 

Accordingly,  in  evaluating  the  predictive  performance  of  the  model,  and 
in  comparing  this  performance  with  those  of  discriminant  analysis  and  multiple 

and  Xg. 

3.  Evaluation  of  the  predictive  performance  of  the  model. 

3.1  Introduction 

The  model  developed  in  this  paper  essentially  produces,  for  given  values 
of  the  covariates,  estimates  of  tj^,  the  cumulative  category  probabilities. 


regression,  we  used  only  the  explanatory  variables  Xg 


and  from  these  estimates  of  P^j>  the  actual  category  probabilities.  Hence, 
the  model  provides  probability  forecasts  of  the  four  storm  imminence 
categories,  for  a  given  meteorological  situation  (as  represented  by  the  values 
of  the  meteorological  variables).  These  probabilistic  forecasts  can  be  given 
directly  as  such,  or  may  be  converted,  by  methods  described  in  Section  3.2, 
into  categorical  forecasts. 

The  problem  of  evaluating  statistical  forecasts  of  this  type  has  been, 
and  continues  to  be,  a  topic  of  major  interest  in  meteorology.  In  Section 
3.3,  we  consider  two  very  simple  methods  of  evaluating  such  forecasts;  these 
two  methods  do  not  necessarily  lead  to  the  same  conclusions  in  relation  to  the 
relative  performance  of  the  various  forecasts. 

3.2  Use  of  the  Model  for  Forecasting 

We  consider  two  possibilities' 

(a)  -Given  the  estimated  probability  forecast,  a  categorical  forecast  can 
be  provided  by  forecasting  the  category  of  maximal  probability.  We  denote 
this  forecast  by  FI. 

(b)  The  model  described  by  equation  (10)  (or,  more  intuitively  by 

equation  (9)),  suggests  an  underlying  continuous  variable,  say  Z,  with  the 
explanatory  variable  falling  into  categories  1,2,3  or  4  accordingly  as  Z  i  0j. 
0j  <  Z  i  Qg*  2  £  0 Z  >  0 2*  respectively.  Given  values  of  the 

covarlates,  and  estimates  0  of  0,  we  can  estimate  Z  by  (in  the  notation  of 
equation  (10)) 


11 


(17) 


Z 


V  V^2 

m=l 


0, 


and  then  provide  a  categorical  prediction  that  the  storm  imminence  category  is 
1,  2.  3  or  4  according  as 

z  ^  8j  9j<  m  e2.  e2  <  z  $  e3.  z  >  e3 

This  forecast  is  denoted  by  F2. 

3.3  Evaluation  of  the  forecasts 

Since  the  two  forecasts  which  we  are  considering  here  are  categorical 
forecasts,  one  plausible  criterion  for  evaluating  these  forecasts  would  be  the 
number  of  correct  forecasts  achieved.  Different  forecasts  can  be  readily 
compared  using  this  measure.  Since  the  categories  being  forecast  are  ordered, 
an  incorrect  forecast  which  is  within  one  category  of  being  correct  is 
presumably  preferable  to  one  which  "misses”  by  two  or  more  categories.  Hence 
an  alternative  measure  of  performance  would  be  the  number  of  forecasts  which 
are  within  one  category  of  being  correct.  We  use  both  of  the  above  measures 
in  the  paper  to  compare  forecasts.  As  we  will  show,  the  different  criteria 
can,  in  some  cases,  lead  to  a  different  ranking  of  forecasts. 

In  estimating  the  predictive  performance  of  the  model  using  the  above 
measures,  we  omitted  each  data  point  in  turn,  estimated  the  model  parameters 
from  all  the  remaining  data,  and  then  used  the  forecast  procedures  FI  and  F2 
to  predict  the  category  of  the  omitted  data  point.  For  comparative  purposes. 


we  also  used  the  standard  techniques  of  discriminant  analysis  and  multiple 
regression  (using  in  the  latter  case  as  dependent  variable  the  coded  imminence 
of  tropical  storm  variable,  which  takes  values  1,  2.  3  and  4. 


4.  Discussion 

The  results  of  the  cross-validation  procedure,  described  in  Section  3.3, 
for  forecast  methods  FI  and  F2,  and  for  discriminant  analysis  and  multiple 
regression,  are  given  in  Tables  1,2,3  and  4  respectively.  An  overall  summary 
of  the  relative  performance  of  the  various  methods  is  presented  in  Table  5. 

We  would  emphasise  that  any  conclusions  drawn  here  in  relation  to  the  efficacy 
of  the  various  procedures  are  valid  only  in  relation  to  the  present 
application.  Broader  statements  about  the  general  performance  of  these 
methods  would  require  extensive  further  analysis. 

It  is  clear  from  Table  5  that  no  single  technique  is  clearly  superior. 
Using  the  criterion  of  maximising  the  number  of  correct  forecasts,  the 
generalized  linear  model  applied  in  this  paper,  with  forecasting  strategy  F2, 
is  the  best  among  those  considered.  However,  if  maximising  the  numbers  of 
forecasts  correct  within  one  category  is  chosen  as  the  comparative  criterion, 
then  multiple  regression  emerges  as  the  optimal  methodology. 

From  Table  5  it  is  clear  that  the  performance  of  discriminant  analysis 
is,  in  this  application  at  least,  somewhat  inferior  to  that  of  the  other 
techniques.  A  comparison  of  Tables  1  and  2  (which  use  the  same  model  but 
different  forecasting  strategies  based  on  the  model)  reveals  some  interesting 
facets  about  each  of  these  strategies.  Procedure  FI  always  forecasts  either 
category  1  or  category  4,  and  produces  the  highest  overall  percentage  of 
correct  forecasts.  Procedure  F2  is  less  extreme,  and  "smears"  ihe  category  1 


forecasts  over  categories  1  and  2.  and  the  category  4  forecasts  over 
categories  3  and  4;  this  has  the  effect  of  reducing  the  percentage  of  correct 
forecasts  but  increasing  the  percentage  of  forecasts  correct  to  within  one 
category.  Multiple  regression  "smears"  the  forecasts  still  further,  with  a 
resultant  decrease  in  the  percentage  of  correct  forecasts  and  Increase  in  the 
percentage  correct  to  within  one  category.  Multiple  regression  is.  in  fact, 
the  method  which  produces  the  highest  overall  percentage  of  forecasts  correct 
to  within  one  category. 

It  is  clear  that  the  choice  of  strategy  (forecasting  procedure)  among 
those  considered  and  described  here  will  be  greatly  influenced  by  the  relative 
importance/seriousness  of  the  various  correct/incorrect  forecasts.  This 
strongly  suggests  that  a  decision  theoretic  approach  might  be  considered, 
although,  in  practice,  the  specification  of  a  loss  function  may  be  difficult 
and  may  unduly  Influence  the  choice  of  strategy. 
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Table  1 

Forecast  Procedure:  FI 


Number  of  Forecasts 

True 

State 

Forecast  State 

1 

2 

3 

4 

To  tal 

1 

27 

0 

0 

34 

61 

2 

10 

0 

0 

32 

42 

3 

4 

0 

0 

20 

24 

4 

11 

0 

0 

445 

456 

total 

52 

0 

0 

531 

583 

Table  2 

Forecast  Procedure:  F2 


Number  of  Forecasts 

True 

Forecast  State 

State 

1 

2 

3 

4 

Total 

1 

18 

9 

7 

27 

61 

2 

8 

2 

5 

27 

42 

3 

1 

3 

3 

17 

24 

4 

4 

7 

17 

428 

456 

total 

31 

21 

32 

499 

583 

15 


Table  3 

Forecast  Procedure:  Discriminant  Analysis 


Number  of  Forecasts 


True 


State 

1 

1 

27 

2 

14 

3 

3 

4 

14 

total 

58 

Forecast  State 


2 

3 

4 

12 

16 

6 

5 

9 

14 

3 

14 

4 

26 

62 

354 

46 

101 

378 

Total 


61 


42 


24 


456 


583 


Table  4 

Forecast  Procedure:  Multiple  Regression 


Number  of  Forecasts 

True 

State 

Forecast  State 

1 

2 

3 

4 

Total 

1 

4 

21 

31 

5 

61 

2 

0 

11 

21 

10 

42 

3 

0 

2 

18 

4 

24 

4 

0 

11 

110 

335 

456 

total 


405 


180 


354 


583 


Table  5 


Forecast 

Procedure 

X  Correct 
Forecasts 

X  Forecasts 
Correct  with 
one  Category 

FI 

81 

86 

F2 

77 

88 

Discriminant 

Analysis 

69 

86 

Multiple 

Regression 

63 

90 
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